Let activities heartbeat during worker shutdown#2903
maciejdudko
left a comment
Hi @baekgyu-kim, thank you for your contribution! It's great to see someone taking on these long-standing issues. However, this is not the right implementation.
There should be a new worker option to enable heartbeating during shutdown. It should default to disabled, and when disabled, the behavior should be identical to the existing behavior for backward compatibility.
When the option is enabled, the heartbeat behavior should be identical to normal heartbeating when the worker is not shutting down. There should be no additional code path that calls sendHeartbeatRequest in a different way; the existing mechanism should be used. The way to achieve that is to modify SyncActivityWorker.shutdown so that heartbeatExecutor.shutdown is called only after all outstanding activity tasks have finished executing.
If you need assistance with the implementation, feel free to reach out on the community Slack: either message me directly or post in the #java-sdk channel.
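The ordering requested above can be sketched with plain `java.util.concurrent` executors. This is only an illustration under stated assumptions: `DeferredHeartbeatShutdownSketch`, its fields, and its methods are hypothetical stand-ins and not the SDK's `SyncActivityWorker`. The only points it tries to show are that the heartbeat executor is closed after the task executor has drained when the option is on, and that `interruptTasks` forces the existing behavior.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an activity worker; names are illustrative only. */
class DeferredHeartbeatShutdownSketch {
  private final ExecutorService taskExecutor = Executors.newCachedThreadPool();
  private final ScheduledExecutorService heartbeatExecutor =
      Executors.newSingleThreadScheduledExecutor();
  private final boolean allowActivityHeartbeatDuringShutdown;

  DeferredHeartbeatShutdownSketch(boolean allowActivityHeartbeatDuringShutdown) {
    this.allowActivityHeartbeatDuringShutdown = allowActivityHeartbeatDuringShutdown;
  }

  void submit(Runnable activityTask) {
    taskExecutor.execute(activityTask);
  }

  boolean isHeartbeatExecutorShutdown() {
    return heartbeatExecutor.isShutdown();
  }

  /** Returns a future that completes once all outstanding tasks have finished. */
  CompletableFuture<Void> shutdown(boolean interruptTasks) {
    // shutdownNow (interruptTasks == true) behaves as if the option were disabled.
    boolean deferHeartbeatShutdown = allowActivityHeartbeatDuringShutdown && !interruptTasks;
    if (!deferHeartbeatShutdown) {
      // Existing behavior: heartbeating stops as soon as shutdown is requested.
      heartbeatExecutor.shutdown();
    }
    if (interruptTasks) {
      taskExecutor.shutdownNow();
    } else {
      taskExecutor.shutdown();
    }
    return CompletableFuture.runAsync(
        () -> {
          try {
            // Wait until every outstanding task has finished executing.
            while (!taskExecutor.awaitTermination(1, TimeUnit.SECONDS)) {
              // keep waiting
            }
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            // Idempotent: a no-op if the executor was already shut down above.
            heartbeatExecutor.shutdown();
          }
        });
  }
}
```

With the flag set and a graceful shutdown, `isHeartbeatExecutorShutdown()` stays false until the last submitted task returns; in every other combination it turns true immediately, which matches the backward-compatibility requirement.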
Hi @maciejdudko, It now adds an experimental Whenever you have a chance, I'd appreciate another look. Thanks again! |
| private PollerBehavior workflowTaskPollersBehavior; | ||
| private PollerBehavior activityTaskPollersBehavior; | ||
| private PollerBehavior nexusTaskPollersBehavior; | ||
| private boolean activityHeartbeatDuringShutdown; |
The field should be named allowActivityHeartbeatDuringShutdown, the options getter should be named getAllowActivityHeartbeatDuringShutdown, and the builder setter should be named setAllowActivityHeartbeatDuringShutdown. Apply this change consistently throughout the PR.
| return null; | ||
| }); | ||
| CompletableFuture<Void> shutdownFuture; | ||
| if (activityHeartbeatDuringShutdown) { |
When interruptTasks is true (shutdownNow was called instead of shutdown), it should behave as if heartbeat during shutdown was disabled.
| if (activityHeartbeatDuringShutdown) { | |
| if (allowActivityHeartbeatDuringShutdown && !interruptTasks) { |
| * io.temporal.client.ActivityWorkerShutdownException}, unless {@link | ||
| * WorkerOptions.Builder#setActivityHeartbeatDuringShutdown(boolean)} is enabled, in which case | ||
| * heartbeats keep working until the activity tasks finish executing.<br> |
shutdownNow behavior stays the same; see the comment in SyncActivityWorker.
| * io.temporal.client.ActivityWorkerShutdownException}, unless {@link | |
| * WorkerOptions.Builder#setActivityHeartbeatDuringShutdown(boolean)} is enabled, in which case | |
| * heartbeats keep working until the activity tasks finish executing.<br> | |
| * io.temporal.client.ActivityWorkerShutdownException}.<br> |
| /** | ||
| * If enabled, activities can keep heartbeating while the worker is shutting down. The activity | ||
| * heartbeat executor is closed only after all outstanding activity tasks have finished | ||
| * executing, so {@link io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} behaves | ||
| * exactly as it does while the worker is running: heartbeats are throttled and sent to the | ||
| * server, which keeps the server from timing the activity out during the {@link | ||
| * WorkerFactory#awaitTermination(long, java.util.concurrent.TimeUnit)} grace period. | ||
| * | ||
| * <p>Note that with this option enabled activities are no longer notified of the worker | ||
| * shutdown by an {@link io.temporal.client.ActivityWorkerShutdownException} thrown from {@code | ||
| * heartbeat}, so they are expected to complete within the termination grace period on their | ||
| * own. | ||
| * | ||
| * <p>Defaults to false, meaning that after shutdown is requested, {@link | ||
| * io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} stops sending heartbeats and | ||
| * throws {@link io.temporal.client.ActivityWorkerShutdownException}. | ||
| */ | ||
| @Experimental | ||
| public Builder setActivityHeartbeatDuringShutdown(boolean activityHeartbeatDuringShutdown) { |
We don't want to document implementation details.
| /** | |
| * If enabled, activities can keep heartbeating while the worker is shutting down. The activity | |
| * heartbeat executor is closed only after all outstanding activity tasks have finished | |
| * executing, so {@link io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} behaves | |
| * exactly as it does while the worker is running: heartbeats are throttled and sent to the | |
| * server, which keeps the server from timing the activity out during the {@link | |
| * WorkerFactory#awaitTermination(long, java.util.concurrent.TimeUnit)} grace period. | |
| * | |
| * <p>Note that with this option enabled activities are no longer notified of the worker | |
| * shutdown by an {@link io.temporal.client.ActivityWorkerShutdownException} thrown from {@code | |
| * heartbeat}, so they are expected to complete within the termination grace period on their | |
| * own. | |
| * | |
| * <p>Defaults to false, meaning that after shutdown is requested, {@link | |
| * io.temporal.activity.ActivityExecutionContext#heartbeat(Object)} stops sending heartbeats and | |
| * throws {@link io.temporal.client.ActivityWorkerShutdownException}. | |
| */ | |
| @Experimental | |
| public Builder setActivityHeartbeatDuringShutdown(boolean activityHeartbeatDuringShutdown) { | |
| /** | |
| * If true, activities can keep heartbeating during graceful worker shutdown (see {@link | |
| * io.temporal.worker.WorkerFactory#shutdown WorkerFactory.shutdown}). Defaults to false, | |
| * which means that after graceful shutdown is requested, calling {@link | |
| * io.temporal.activity.ActivityExecutionContext#heartbeat ActivityExecutionContext.heartbeat} | |
| * does not send a heartbeat and instead throws {@link | |
| * io.temporal.client.ActivityWorkerShutdownException ActivityWorkerShutdownException}. This | |
| * option is ignored by non-graceful shutdown (see {@link | |
| * io.temporal.worker.WorkerFactory#shutdownNow WorkerFactory.shutdownNow}). | |
| * | |
| * <p>Note that with this option enabled, activities are no longer notified of the worker | |
| * shutdown by the {@link io.temporal.client.ActivityWorkerShutdownException | |
| * ActivityWorkerShutdownException} exception, so they are expected to complete within the | |
| * termination grace period on their own. | |
| */ | |
| @Experimental | |
| public Builder setAllowActivityHeartbeatDuringShutdown(boolean allowActivityHeartbeatDuringShutdown) { |
| WorkflowExecution execution = WorkflowClient.start(workflow::execute); | ||
| started.get(); | ||
| testWorkflowRule.getTestEnvironment().shutdown(); |
There's a race condition here: the shutdown() call can go through before the activity worker receives the task, which prevents the activity from running and makes the test fail.
This feature will be easier to test using a standalone activity. It should work like this:
- Test starts activity.
- Test blocks on semaphore 1 until activity starts.
- Activity signals semaphore 1.
- Activity blocks on semaphore 2 until shutdown is triggered.
- Test calls shutdown().
- Test signals semaphore 2.
- Activity heartbeats then returns. (An exception will fail the activity.)
- Test calls result() on activity handle to ensure it succeeded. (Failure will throw exception and fail the test.)
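The handshake in the list above can be sketched without the SDK. In this minimal sketch, `FakeWorker`, its `heartbeat()` method, and the `IllegalStateException` it throws are hypothetical stand-ins (the real SDK throws `ActivityWorkerShutdownException`, and a real test would start a standalone activity and call `result()` on its handle). The two semaphores are the part that carries over: they guarantee the activity is already running before `shutdown()` is called, and that the heartbeat happens only after shutdown was triggered.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownHeartbeatTestSketch {

  /** Hypothetical worker stand-in; only models "heartbeat fails after shutdown". */
  static final class FakeWorker {
    private final ExecutorService tasks = Executors.newSingleThreadExecutor();
    private final AtomicBoolean heartbeatsAllowed = new AtomicBoolean(true);
    private final boolean allowHeartbeatDuringShutdown;

    FakeWorker(boolean allowHeartbeatDuringShutdown) {
      this.allowHeartbeatDuringShutdown = allowHeartbeatDuringShutdown;
    }

    <T> Future<T> start(Callable<T> activity) {
      return tasks.submit(activity);
    }

    void heartbeat() {
      if (!heartbeatsAllowed.get()) {
        // Stand-in for ActivityWorkerShutdownException.
        throw new IllegalStateException("worker is shutting down");
      }
    }

    void shutdown() {
      if (!allowHeartbeatDuringShutdown) {
        heartbeatsAllowed.set(false);
      }
      tasks.shutdown(); // graceful: the running activity is allowed to finish
    }
  }

  /** Runs the scenario and reports whether the activity completed successfully. */
  static boolean activitySucceeds(boolean allowHeartbeatDuringShutdown) {
    FakeWorker worker = new FakeWorker(allowHeartbeatDuringShutdown);
    Semaphore activityStarted = new Semaphore(0); // semaphore 1
    Semaphore shutdownTriggered = new Semaphore(0); // semaphore 2

    Future<String> handle =
        worker.start(
            () -> {
              activityStarted.release(); // activity signals semaphore 1
              shutdownTriggered.acquire(); // activity blocks on semaphore 2
              worker.heartbeat(); // an exception here fails the activity
              return "done";
            });
    try {
      activityStarted.acquire(); // test blocks until the activity has started
      worker.shutdown();
      shutdownTriggered.release(); // test signals semaphore 2
      return "done".equals(handle.get(10, TimeUnit.SECONDS));
    } catch (ExecutionException e) {
      return false; // the activity failed, e.g. because heartbeat threw
    } catch (InterruptedException | TimeoutException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because the test thread acquires semaphore 1 before calling `shutdown()`, the race described above cannot occur: the task is already executing when shutdown is requested.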
| * ActivityWorkerShutdownException}. | ||
| */ | ||
| @Test | ||
| public void testHeartbeatingActivityCompletesDuringShutdown() |
Also add a test case for when shutdownNow is called instead of shutdown.
What was changed
- heartbeatExecutor already shut down), HeartbeatContextImpl.heartbeat() now emits the heartbeat to the server before throwing ActivityWorkerShutdownException (previously: thrown without sending).
- heartbeat() cannot flood the server.
- ActivityWorkerShutdownException is always thrown, so a transient failure cannot mask the shutdown signal.
- ActivityWorkerShutdownException Javadoc updated accordingly.
Why?
heartbeat() threw immediately without contacting the server → no heartbeat during the awaitTermination grace period → server times the activity out and retries it → duplicate executions, despite the worker deliberately giving the activity time to finish.
Checklist
Closes Add the ability to keep heartbeating while the worker is shutting down #2075
How was this tested:
New HeartbeatContextImplTest cases:
Any docs updates needed?
ActivityWorkerShutdownException Javadoc.